Abstract: The Marcinkiewicz Strong Law,
\[ \lim_{n\to\infty} \frac{1}{n^{1/p}} \sum_{k=1}^{n} (D_k - D) = 0 \quad \text{a.s.} \]
with $p \in (1,2)$, is studied for outer products $D_k = X_k \bar{X}_k^T$, where $\{X_k\}$, $\{\bar{X}_k\}$ are both two-sided (multivariate) linear processes (with coefficient matrices $(C_l)$, $(\bar{C}_l)$ and i.i.d. zero-mean innovations $\{\Xi\}$, $\{\bar{\Xi}\}$). Matrix sequences $C_l$ and $\bar{C}_l$ can decay slowly enough (as $|l| \to \infty$) that $\{X_k, \bar{X}_k\}$ have long-range dependence while $\{D_k\}$ can have heavy tails. In particular, the heavy-tail and long-range-dependence phenomena for $\{D_k\}$ are handled simultaneously and a new decoupling property is proved that shows the convergence rate is determined by the worst of the heavy tails or the long-range dependence, but not the combination. The main result is applied to obtain Marcinkiewicz Strong Laws of Large Numbers for stochastic approximation, non-linear function forms and autocovariances.
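1. Introduction

Let $D_k = X_k \bar{X}_k^T$ be random matrices with $\{X_k\}$, $\{\bar{X}_k\}$ being $\mathbb{R}^d$-valued (possibly two-sided, multivariate) linear processes
\[ X_k = \sum_{l=-\infty}^{\infty} C_{k-l}\,\Xi_l, \qquad \bar{X}_k = \sum_{l=-\infty}^{\infty} \bar{C}_{k-l}\,\bar{\Xi}_l, \tag{1} \]
defined on some probability space $(\Omega, \mathcal{F}, P)$. Here
\[ \left\{ (\Xi_l, \bar{\Xi}_l) = \left( (\xi_l^{(1)}, \ldots, \xi_l^{(m)}),\ (\bar{\xi}_l^{(1)}, \ldots, \bar{\xi}_l^{(m)}) \right),\ l \in \mathbb{Z} \right\} \]
are i.i.d. zero-mean random $\mathbb{R}^{m+m}$-vectors (innovations) such that $E[|\Xi_1|^2] < \infty$, $E[|\bar{\Xi}_1|^2] < \infty$, and $(C_l)_{l\in\mathbb{Z}}$, $(\bar{C}_l)_{l\in\mathbb{Z}}$ are $\mathbb{R}^{d\times m}$-matrix sequences satisfying $\sup_{l\in\mathbb{Z}} |l|^{\sigma} \|C_l\| < \infty$, $\sup_{l\in\mathbb{Z}} |l|^{\bar{\sigma}} \|\bar{C}_l\| < \infty$ for some $(\sigma, \bar{\sigma}) \in (\frac{1}{2}, 1]$. Hence, $\{D_k\}$ can have heavy tails as well as long-range dependence.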
Linear process models are heavily used in finance, engineering, econometrics, and statistics. In fact, classical time-series theory mainly involves the statistical analysis of stationary linear processes. Current applications in network theory and financial mathematics lead us to study time series models where $\{D_k\}$ can have heavy tails and long memory. Heavy-tailed data exhibit frequent extremes and infinite variance, while positively-correlated long memory data display great serial momentum or inertia. Heavy-tailed data with long-range dependence have been observed in a plethora of empirical data sets over the last fifty years or so. For instance, Mandelbrot [11] observed that long memory time series often were heavy-tailed and self-similar.

The possible rates of convergence are affected by both long-range dependence and heavy tails. There are two broad types of dependence for linear processes. If the coefficients $(C_l)$ are absolutely summable and the innovations have second moments, then the covariances of $X_k$ are summable and we say that $\{X_k\}$ is short-range dependent (SRD). On the contrary, we generically say that $\{X_k\}$ is long-range dependent (LRD) if its covariances are not absolutely summable. Practically, by choosing appropriate coefficients, the matrix sequence $(C_l)$ can decay slowly enough (as $|l| \to \infty$) that $\{X_k\}$ shows LRD. We consider $\{D_k\}$ to have LRD too in this non-summable $(C_l)$ case even though the second moments for $D_k$ may not exist. There are also two general kinds of randomness. If each $D_k$ fails to have a second moment, then we say it is heavy-tailed (HT), and otherwise light-tailed (LT). In our setting, $D_k$ will be either HT or LT depending upon the moments of and dependence between $\Xi_l$ and $\bar{\Xi}_l$.

There are few general Marcinkiewicz Strong Law of Large Numbers (MSLLN) results for partial sums of $X_k$ under both heavy tails and long-range dependence, and the MSLLN for partial sums of nonlinear functions of $X_k$ is almost untouched. Our purpose here is to establish a method and a structure under which certain MSLLN for heavy-tailed and long-range-dependent phenomena can be handled properly. Technically, our goal is to prove
\[ \lim_{n\to\infty} \frac{1}{n^{1/p}} \sum_{k=1}^{n} (D_k - D) = 0 \quad \text{a.s.} \]
when $\max_{1\le i,j\le m} \sup_{t\ge 0} t^{\alpha} P(|\xi_1^{(i)} \bar{\xi}_1^{(j)}| > t) < \infty$ for some $\alpha > 1$ and $\sup_{l\in\mathbb{Z}} |l|^{\sigma}\|C_l\| < \infty$, $\sup_{l\in\mathbb{Z}} |l|^{\bar{\sigma}}\|\bar{C}_l\| < \infty$ with $(\sigma, \bar{\sigma}) \in (\frac{1}{2}, 1]$. This format of $\{D_k\}$ is critical for our result since it allows the LRD and HT conditions to decouple and the convergence rate to be determined by the worst of the HT requirement $p < (\alpha \wedge 2)$ and the LRD condition $p < \frac{1}{2-\sigma-\bar{\sigma}}$, but not the combination. A bifurcation happens. Consider the summation
\[ D_k = \sum_{l,m=-\infty}^{\infty} C_{k-l}\,\Xi_l \bar{\Xi}_m^T\, \bar{C}_{k-m}^T \]
broken into off-diagonal and diagonal terms. Due to the independence of $(\Xi_l, \bar{\Xi}_l)$ from $(\Xi_m, \bar{\Xi}_m)$ for $l \ne m$, the off-diagonal sum $\sum_{l \ne m} C_{k-l}\,\Xi_l \bar{\Xi}_m^T\, \bar{C}_{k-m}^T$ does not have heavy tails (when $\alpha > 1$). Conversely, since $\sigma + \bar{\sigma} > 1$, the diagonal sum $\sum_{l=-\infty}^{\infty} C_{k-l}\,\Xi_l \bar{\Xi}_l^T\, \bar{C}_{k-l}^T$ does not experience long-range dependence. In addition, the rate of convergence depends on the worst of $(\alpha \wedge 2)$ and $\frac{1}{2-\sigma-\bar{\sigma}}$, so whenever we are in the LRD dominant case, $\alpha > \frac{1}{2-\sigma-\bar{\sigma}}$, the off-diagonal terms dictate the rate of convergence by the LRD effect ($p < \frac{1}{2-\sigma-\bar{\sigma}}$), and in the HT dominant case, $\alpha < \frac{1}{2-\sigma-\bar{\sigma}}$, the diagonal terms dictate the rate of convergence by the HT effect ($p < \alpha$). The bifurcation point is when $\alpha = \frac{1}{2-\sigma-\bar{\sigma}}$ and $\alpha < 2$.
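The decoupled rate condition described above can be evaluated mechanically. The following is a minimal sketch, assuming only the two constraints quoted in the text (the heavy-tail cap min(alpha, 2) and the long-range-dependence cap 1/(2 - sigma - sigma_bar)); the function name `mslln_rate_bound` and the regime labels are hypothetical names introduced for illustration.

```python
def mslln_rate_bound(alpha, sigma, sigma_bar):
    """Return (p_star, regime): the MSLLN is claimed for every p < p_star.

    Assumes alpha > 1 and sigma, sigma_bar in (1/2, 1], as in the text.
    When alpha >= 2 the "HT" cap is simply the constraint p < 2.
    """
    if alpha <= 1:
        raise ValueError("alpha must exceed 1")
    if not (0.5 < sigma <= 1 and 0.5 < sigma_bar <= 1):
        raise ValueError("sigma and sigma_bar must lie in (1/2, 1]")
    ht_bound = min(alpha, 2.0)                 # heavy-tail constraint
    gap = 2.0 - sigma - sigma_bar
    lrd_bound = float("inf") if gap == 0 else 1.0 / gap  # LRD constraint
    p_star = min(ht_bound, lrd_bound)          # worst of the two, not a combination
    if lrd_bound < ht_bound:
        regime = "LRD-dominant"
    elif ht_bound < lrd_bound:
        regime = "HT-dominant"
    else:
        regime = "bifurcation"
    return p_star, regime


# Example: slowly decaying coefficients make long-range dependence the binding constraint.
print(mslln_rate_bound(1.5, 0.6, 0.6))
# Example: fast decay and heavy tails make the tail index the binding constraint.
print(mslln_rate_bound(1.2, 0.9, 0.9))
```

The point of the sketch is that the bound is a minimum of two separate quantities; neither tail index nor decay exponent enters the other's constraint.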

2. Background 

In this section we review some of the existing literature on MSLLN or weak convergence for partial sums, sample covariances and non-linear functions of partial sums with heavy tails and/or long-range dependence. Many existing results were only established in the scalar case. For ease of assimilation we use $\{x_k\}$, $(c_l)$, $\{d_k\}$ and $\{\xi_l\}$ to denote the scalar versions of $\{X_k\}$, $(C_l)$, $\{D_k\}$ and $\{\Xi_l\}$, and $\{x_{k+h}\}$ for $\{\bar{x}_k\}$ when it is a shifted version of $\{x_k\}$.

2.1. Partial Sums 

There are many publications that consider almost sure rates of convergence for linear processes having either LRD or HT. However, only a few, like Louhchi and Soulier [10], considered the combination of these two phenomena. They stated the following result for linear symmetric $\alpha$-stable (S$\alpha$S) processes.

Theorem 1 Let $\{\xi_j\}_{j\in\mathbb{Z}}$ be an i.i.d. sequence of S$\alpha$S random variables with $1 < \alpha < 2$ and $(c_j)_{j\in\mathbb{Z}}$ be a bounded collection such that $\sum_{j\in\mathbb{Z}} |c_j|^{s} < \infty$ for some $s \in [1, \alpha)$. Set $x_k = \sum_{j\in\mathbb{Z}} c_{k-j}\xi_j$. Then, for $p \in (1,2)$ satisfying $\frac{1}{p} > 1 - \frac{1}{s} + \frac{1}{\alpha}$,
\[ \frac{1}{n^{1/p}} \sum_{k=1}^{n} x_k \to 0 \quad \text{a.s.} \]

The condition $s < \alpha$ ensures $\sum_{j\in\mathbb{Z}} |c_j|^{\alpha} < \infty$ and thereby convergence of $\sum_{j\in\mathbb{Z}} c_{k-j}\xi_j$. Moreover, $\{x_k\}$ not only exhibits heavy tails but also long-range dependence if, for example, $c_j = |j|^{-\sigma}$ for $j \ne 0$ and some $\sigma \in (\frac{1}{2}, 1)$. Notice that there is an interaction between the heavy-tail condition and the long-range-dependence condition. In particular, for a given $p$, heavier tails ($\alpha$ becomes smaller) imply that one cannot have as much long-range dependence ($s$ must become smaller) and vice versa. Moreover, this result is difficult or even impossible to apply in our outer product setting due to the fact that the $x_k$'s are linear processes with S$\alpha$S innovations and so $x_k$ cannot be decomposed into a product of two variables even in the scalar case.
2.2. Non-linear function of partial sums

The limit behavior of suitably normalized partial sums of stationary random variables that demonstrate either LRD or HT has been the subject of study by many authors. Applications can be found in geophysics, economics, hydrology and statistics. For instance, in contexts like Whittle approximation, the asymptotic behavior of quadratic forms of stationary sequences has an important role. In addition, the efficacy of the "R/S-statistic" theory, which was introduced by Hurst for estimating the long-run, non-periodic statistical dependence of time series and developed by Mandelbrot [12], can be confirmed by convergence of these limit functions.

There are many results that deal with the existence and description of limit distributions of sums
\[ S_{n,h}(t) = \sum_{k=1}^{\lfloor nt \rfloor} \big( h(x_k) - E(h(x_k)) \big), \quad t \ge 0, \tag{2} \]
where $h$ is a (nonlinear) function. The limit behavior for a Gaussian LRD process $\{x_k\}$ was first studied by Rosenblatt [14]. Afterward, Dobrushin and Major [4] explained it in more general form. Then Taqqu [18] showed that the limit in distribution of particular normalized sums $S_{n,h}(t)$ is determined by the Hermite rank $m^* \in \{1, 2, \ldots\}$ of $h(x)$, which is the index of the first nonzero coefficient in the Hermite expansion. On the other hand, the behavior of nonlinear non-Gaussian LRD processes is much less commonly known. One of the most studied models of non-Gaussian LRD processes is the one-sided linear (moving average) process,
\[ x_k = \sum_{j=0}^{\infty} c_j \xi_{k-j}, \tag{3} \]
in which the innovations $\xi_k$, $k \in \mathbb{Z}$, are independent and identically distributed (i.i.d.), have zero mean with finite variance, and the coefficients $c_j$ satisfy
\[ c_j \sim c_{\sigma}\, j^{-\sigma}, \quad j \ge 1, \tag{4} \]
for some constant $c_{\sigma} \ne 0$, $c_0 = 1$ and $\sigma \in (\frac{1}{2}, 1)$.
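To make the model (3)-(4) concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not part of the cited results: the infinite sum is truncated at a finite number of lags, the coefficients are taken to be exactly $c_0 = 1$, $c_j = j^{-\sigma}$, and the innovations are drawn from a symmetric Pareto-type law with $P(|\xi| > x) = x^{-2\alpha}$ for $x \ge 1$ (zero mean by symmetry, finite variance and infinite fourth moment when $1 < \alpha < 2$). All function names are hypothetical.

```python
import random


def ma_coefficients(sigma, n_lags):
    """Coefficients c_0 = 1, c_j = j**(-sigma) for j = 1..n_lags (truncation is an assumption)."""
    return [1.0] + [j ** (-sigma) for j in range(1, n_lags + 1)]


def symmetric_pareto(alpha, rng):
    """Draw xi with P(|xi| > x) = x**(-2*alpha) for x >= 1 and a random sign."""
    u = 1.0 - rng.random()                    # uniform on (0, 1]
    magnitude = u ** (-1.0 / (2.0 * alpha))   # inverse-CDF sampling
    return magnitude if rng.random() < 0.5 else -magnitude


def linear_process(innovations, coeffs):
    """Truncated one-sided moving average x_k = sum_{j=0}^{J} c_j * xi_{k-j}.

    Returns values for k = J, ..., len(innovations) - 1, where J = len(coeffs) - 1.
    """
    J = len(coeffs) - 1
    return [
        sum(coeffs[j] * innovations[k - j] for j in range(J + 1))
        for k in range(J, len(innovations))
    ]


# Usage: a short heavy-tailed, slowly decaying path (parameter values are arbitrary).
rng = random.Random(0)
xi = [symmetric_pareto(1.5, rng) for _ in range(600)]
x = linear_process(xi, ma_coefficients(0.7, 100))
print(len(x))  # 500 values
```

Truncating the lag window removes the true long memory at lags beyond the window, so the sketch only mimics the model over horizons shorter than `n_lags`.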

Surgailis [16] considered the limit behavior of partial sum processes $S_{n,h}(t)$ for polynomial $h$ of the linear process $\{x_k\}_{k\in\mathbb{Z}}$. Later, Giraitis and Surgailis [5][6] and Avram and Taqqu [1] noticed that the only difference between this case and the Gaussian case is that the Hermite rank $m^*$ of $h(x)$ has to be replaced by the Appell rank $m$.

Vaiciulis [19] investigated distributional convergence for normalized partial sums of Appell polynomials $A_m(x_k)$ of linear processes $x_k$ having both long memory and heavy tails in the sense that $E A_m^2(x_k) = \infty$. In particular, he assumed $x_k$ had the form (3) with innovations $\{\xi_k\}$ belonging to the domain of attraction of an $\alpha$-stable law with $1 < \alpha < 2$ and $c_j$ following (4). The limit was: i) an $\alpha$-stable Levy process, ii) an $m$th order Hermite process, or iii) the sum of two mutually independent $\alpha$-stable Levy and $m$th order Hermite processes, depending on the values of $\alpha$, $m$ and $\sigma$, where $\sigma \in (\frac{1}{2}, 1)$.

Thereafter, Surgailis [17] considered the bounded, infinitely differentiable $h$ case where $\{x_k\}$ was LRD and had innovations with probability tail decay of $x^{-2\alpha}$ for $1 < \alpha < 2$. Suppose $x_k$ satisfies (3) and (4). Then he showed three different limiting behaviors corresponding to three different LRD-HT settings: $n^{(2\sigma-1)m^*/2 - 1} S_{n,h}(t)$, $n^{-1/(2\alpha\sigma)} S_{n,h}(t)$ or $n^{-1/2} S_{n,h}(t)$ converge in distribution to, respectively, a Hermite process of order $m^*$, a $2\alpha\sigma$-stable Levy process or a Brownian motion, all at time $t$, for certain ranges of $\alpha$ and $\sigma$.


2.3. Sample Covariances 


Auto-covariance functions play a substantial role in time series analysis and have diverse applications in inference problems, including hypothesis testing and parameter estimation. The natural estimator of the auto-covariance is the sample covariance. Hence, the convergence properties of the sample covariance are of great interest. In the case of LRD and HT, this is an area of active research.
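As a concrete reference point, the uncentered lag-$h$ sample autocovariance $\frac{1}{n}\sum_{k=1}^{n} x_k x_{k+h}$ of a zero-mean series can be computed as in the sketch below. The function name `sample_autocovariance` and its argument order are hypothetical; the estimator assumes the mean is known to be zero, so no sample mean is subtracted.

```python
def sample_autocovariance(x, h, n):
    """Return (1/n) * sum_{k=1}^{n} x_k * x_{k+h} for a zero-mean series.

    `x` holds the observations x_1, ..., x_{n+v} as x[0], ..., x[n+v-1];
    the lag h must satisfy 0 <= h and n + h <= len(x).
    """
    if h < 0 or n < 1 or n + h > len(x):
        raise ValueError("need 0 <= h and 1 <= n with n + h <= len(x)")
    return sum(x[k] * x[k + h] for k in range(n)) / n


# Usage on a toy series with n = 3 products at lag 1.
print(sample_autocovariance([1.0, 2.0, 3.0, 4.0], 1, 3))
```

Using the same number $n$ of products for every lag keeps the normalization identical across lags, at the cost of needing $n + h$ observations.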

Davis and Resnick [3] studied the distributional convergence of sample autocovariances for two-sided linear processes with innovations that were i.i.d. and had regularly varying tail probabilities of index $\alpha > 0$,
\[ P(|\xi_k| > x) = x^{-2\alpha} L(x), \qquad \frac{P(\xi_k > x)}{P(|\xi_k| > x)} \to p \quad \text{and} \quad \frac{P(\xi_k < -x)}{P(|\xi_k| > x)} \to q \quad \text{as } x \to \infty, \tag{5} \]
where $L(\cdot)$ is a function slowly varying at infinity ($\lim_{x\to\infty} \frac{L(ax)}{L(x)} = 1$ for every $a > 0$), $0 \le p \le 1$ and $q = 1 - p$. They considered the case where the innovations had finite variance but infinite fourth moment, i.e. $1 < \alpha < 2$, with absolutely summable coefficients $c_j$ of the form (4).

Note: We choose to scale our constants, here and in the sequel, so that $\alpha < 2$ always means HT of the object of interest, which is $x_k x_{k+h}$ or more generally $x_k \bar{x}_k$.


In the case of infinite fourth moment for $\{\xi_k\}_{k\in\mathbb{Z}}$, the asymptotic distribution of normalized sample autocovariances of long-memory processes was studied by Horvath and Kokoszka [7]. Suppose we observe the realization $x_1, x_2, \ldots, x_{n+v}$, $n > v \ge 0$. The sample autocovariances and population autocovariances are defined as
\[ \hat{\gamma}_h^{(n)} = \frac{1}{n} \sum_{k=1}^{n} x_k x_{k+h}, \quad h = 0, 1, \ldots, v, \qquad \text{and} \qquad \gamma_h = E[x_0 x_h] = E[\xi_1^2] \sum_{j=0}^{\infty} c_j c_{j+h}, \tag{6} \]





M.A. Kouritzin and S. Sadeghi/MSLLN for outer-products of Linear Models 




respectively. Horvath and Kokoszka [7, Theorem 3.1] studied the asymptotic distribution of $[\hat{\gamma}_h^{(n)} - \gamma_h]$, $h = 0, 1, \ldots, v$, for linear processes of the form (3) with coefficients and innovations satisfying (4) and (5) and a norming constant $a_n = \inf\{x : P(|\xi_1| > x) \le n^{-1}\}$ (roughly of order $n^{\frac{1}{2\alpha}}$) satisfying
\[ \lim_{n\to\infty} n P[\,|\xi_k| > a_n x\,] = x^{-2\alpha}, \quad x > 0. \tag{7} \]
We quote this result in our notation as the following theorem.

Theorem 2 Suppose conditions (3), (4), (5) and (7) hold.

(a) If $1 - \frac{1}{2\alpha} < \sigma < 1$ and $1 < \alpha < 2$, then
\[ n a_n^{-2} \left[ \hat{\gamma}_h^{(n)} - \gamma_h \right] \xrightarrow{d} \left( S - \frac{\alpha}{\alpha - 1} \right) \sum_{j=0}^{\infty} c_j c_{j+h}, \quad h = 0, 1, \ldots, H, \]
where $S$ is an $\alpha$-stable random variable. For the above to hold for $\sigma = 3/4$, we must additionally assume that $a_n^{-4}\, n \ln n \to 0$.

(b) If $\frac{1}{2} < \sigma < 1 - \frac{1}{2\alpha}$ and $1 < \alpha < 2$, then
\[ n^{2\sigma - 1} \left[ \hat{\gamma}_h^{(n)} - \gamma_h \right] \xrightarrow{d} c_{\sigma}^2 \left[ U_{\sigma}(1) \right], \quad h = 0, 1, \ldots, H, \]
where $U_{\sigma}$ is a Rosenblatt process. The Rosenblatt process is often defined by the iterated stochastic integral
\[ U_{\sigma}(t) = 2 \iint_{w_1 < w_2 < t} \left[ \int_0^t (\tau - w_1)_+^{-\sigma} (\tau - w_2)_+^{-\sigma}\, d\tau \right] W(dw_1) W(dw_2), \]
in which $W(\cdot)$ is the standard Wiener process on the real line.

This theorem works for one-sided linear processes with a regularly varying tail condition and gives us weak convergence.

Notice that in Theorem 2, case (a) represents the HT dominant case, $\alpha < \frac{1}{2-2\sigma}$, so the diagonal terms dictate convergence to an $\alpha$-stable distribution. However, case (b) represents the LRD dominant case, $\alpha > \frac{1}{2-2\sigma}$, hence the off-diagonal terms take over and we get convergence to a Rosenblatt process.


3. Main results 

Our first result is in the scalar case. Later, we will extract the full vector-valued 
result as a second main theorem. All proofs are delayed until the next section 
after we have discussed the applications. 

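Theorem 3 Let $\{(\xi_l, \bar{\xi}_l),\ l \in \mathbb{Z}\}$ be i.i.d. zero-mean random variables such that $E[\xi_1^2] < \infty$, $E[\bar{\xi}_1^2] < \infty$ and $\sup_{t\ge 0} t^{\alpha} P(|\xi_1 \bar{\xi}_1| > t) < \infty$ for some $\alpha > 1$. Moreover, suppose $(c_l)_{l\in\mathbb{Z}}$, $(\bar{c}_l)_{l\in\mathbb{Z}}$ satisfy
\[ \sup_{l\in\mathbb{Z}} |l|^{\sigma} |c_l| < \infty, \quad \sup_{l\in\mathbb{Z}} |l|^{\bar{\sigma}} |\bar{c}_l| < \infty \quad \text{for some } \sigma, \bar{\sigma} \in \left( \tfrac{1}{2}, 1 \right], \]
$d_k = \sum_{l,m=-\infty}^{\infty} c_{k-l}\bar{c}_{k-m}\xi_l\bar{\xi}_m$ and $d = E[\xi_1\bar{\xi}_1] \sum_{l=-\infty}^{\infty} c_l \bar{c}_l$. Then, for $p$ satisfying $p < \frac{1}{2-\sigma-\bar{\sigma}} \wedge \alpha \wedge 2$,
\[ \lim_{n\to\infty} \frac{1}{n^{1/p}} \sum_{k=1}^{n} (d_k - d) = 0 \quad \text{a.s.} \]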

Remark 1 The tail probability bound ensures that $E[|\xi_1\bar{\xi}_1|^r] < \infty$ for any $r \in (1, (\alpha \wedge 2))$ and that $E[d_1]$ exists, but it is possible that $E[d_1^2] = \infty$, so we are handling heavy tails for $\{d_k\}$. On the other hand, $E[|\xi_1\bar{\xi}_1|^{\alpha}] < \infty$ implies our tail condition by Markov's inequality. $\sigma$, $\bar{\sigma}$ bound the amount of long-range dependence in $x_k = \sum_{l=-\infty}^{\infty} c_{k-l}\xi_l$, $\bar{x}_k = \sum_{l=-\infty}^{\infty} \bar{c}_{k-l}\bar{\xi}_l$. If $\sigma$ can be taken larger than 1, then $\sum_{k=1}^{\infty} E[x_0 x_k] < \infty$ and there is no long-range dependence in $\{x_k\}$. $\sigma > \frac{1}{2}$ with $E[\xi_1^2] < \infty$ ensures that $\sum_{l=-\infty}^{\infty} c_{k-l}\xi_l$ converges a.s.

Remark 2 Notice that the constraint to handle long-range dependence, $p < \frac{1}{2-\sigma-\bar{\sigma}}$, and the constraint to handle the heavy tails, $p < (\alpha \wedge 2)$, decouple. This decoupling appears to be due to the structure of $d_k$. Due to the independence of $(\xi_l, \bar{\xi}_l)$ from $(\xi_m, \bar{\xi}_m)$ for $l \ne m$, the off-diagonal sum $\sum_{l\ne m} c_{k-l}\bar{c}_{k-m}\xi_l\bar{\xi}_m$ does not have heavy tails. Conversely, since $\sigma + \bar{\sigma} > 1$, the diagonal sum $\sum_{l=-\infty}^{\infty} c_{k-l}\bar{c}_{k-l}\xi_l\bar{\xi}_l$ does not experience long-range dependence.


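We will give a simple example to verify the conditions in Theorem 3. Recall that a non-negative random variable $\xi$ obeys a power law with parameters $\beta > 1$ and $x_{\min} > 0$, written $\xi \sim PL(x_{\min}, \beta)$, if it has density
\[ f(x) = \frac{\beta - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\beta} \quad \forall x \ge x_{\min}, \qquad \text{so} \qquad E|\xi|^r = \begin{cases} x_{\min}^r \left( \frac{\beta - 1}{\beta - 1 - r} \right) & r < \beta - 1 \\ \infty & r \ge \beta - 1. \end{cases} \]
It has a folded $t$ distribution with parameter $\beta > 1$, written $\xi \sim Ft(\beta)$, if it has density
\[ f(x) = \frac{2\,\Gamma\!\left(\frac{\beta}{2}\right)}{\Gamma\!\left(\frac{\beta-1}{2}\right)\sqrt{(\beta-1)\pi}} \left( 1 + \frac{x^2}{\beta - 1} \right)^{-\frac{\beta}{2}} \quad \forall x > 0, \]
so $E(|\xi|^r)$ exists if and only if $r < \beta - 1$.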
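The power-law moment formula above is easy to check numerically. The sketch below is illustrative only: it assumes the density $f(x) = \frac{\beta-1}{x_{\min}}(x/x_{\min})^{-\beta}$ on $x \ge x_{\min}$, samples by inverting the survival function $P(\xi > x) = (x/x_{\min})^{-(\beta-1)}$, and evaluates the closed-form moment; the function names are hypothetical.

```python
import random


def sample_power_law(x_min, beta, rng):
    """Inverse-CDF draw from PL(x_min, beta): P(xi > x) = (x / x_min) ** -(beta - 1)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return x_min * u ** (-1.0 / (beta - 1.0))


def power_law_moment(x_min, beta, r):
    """E|xi|^r for xi ~ PL(x_min, beta): finite exactly when r < beta - 1."""
    if r >= beta - 1.0:
        return float("inf")
    return x_min ** r * (beta - 1.0) / (beta - 1.0 - r)


# Usage: compare a Monte Carlo average with the closed form (values vary with the seed).
rng = random.Random(0)
draws = [sample_power_law(1.0, 6.0, rng) for _ in range(20000)]
print(sum(draws) / len(draws), power_law_moment(1.0, 6.0, 1.0))
```

For moments close to the boundary $r \approx \beta - 1$ the Monte Carlo average converges very slowly, which is itself a symptom of the heavy tail.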

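Example 1 Suppose $p, q, \alpha, \beta, \bar{\beta} > 1$ are such that $\frac{1}{p} + \frac{1}{q} = 1$, $\beta > p\alpha + 1$, $\bar{\beta} > q\alpha + 1$ and $p\alpha, q\alpha \ge 2$. If $\xi_1$ and $\bar{\xi}_1$ have power law distributions, say $\xi_1 \sim PL(x_{\min}, \beta)$, $\bar{\xi}_1 \sim PL(\bar{x}_{\min}, \bar{\beta})$ for some $x_{\min}, \bar{x}_{\min} > 0$, then $E[\xi_1^2], E[\bar{\xi}_1^2] < \infty$ and $\sup_{t\ge 0} t^{\alpha} P(|\xi_1\bar{\xi}_1| > t) < \infty$. If $\xi_1 \sim Ft(\beta)$, $\bar{\xi}_1 \sim Ft(\bar{\beta})$, then $E[\xi_1^2], E[\bar{\xi}_1^2] < \infty$ and $\sup_{t\ge 0} t^{\alpha} P(|\xi_1\bar{\xi}_1| > t) < \infty$. Either way, Theorem 3 applies with properly chosen $(\sigma, \bar{\sigma})$.

We now consider the case where $X_k$ and $\bar{X}_k$ are (multivariate) linear processes.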


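Theorem 4 Let $\{\Xi_l\}$ and $\{\bar{\Xi}_l\}$ be i.i.d. zero-mean random $\mathbb{R}^m$-vectors such that $\Xi_l = (\xi_l^{(1)}, \ldots, \xi_l^{(m)})$, $\bar{\Xi}_l = (\bar{\xi}_l^{(1)}, \ldots, \bar{\xi}_l^{(m)})$,
\[ \max_{1\le i,j\le m} \sup_{t\ge 0} t^{\alpha} P\big( |\xi_1^{(i)} \bar{\xi}_1^{(j)}| > t \big) < \infty \quad \text{for some } \alpha > 1, \]
$E[|\Xi_1|^2] < \infty$ and $E[|\bar{\Xi}_1|^2] < \infty$. Moreover, suppose matrix sequences $(C_l)_{l\in\mathbb{Z}}, (\bar{C}_l)_{l\in\mathbb{Z}} \in \mathbb{R}^{d\times m}$ satisfy
\[ \sup_{l\in\mathbb{Z}} |l|^{\sigma}\|C_l\| < \infty, \quad \sup_{l\in\mathbb{Z}} |l|^{\bar{\sigma}}\|\bar{C}_l\| < \infty \quad \text{for some } (\sigma, \bar{\sigma}) \in \left( \tfrac{1}{2}, 1 \right], \]
$X_k$, $\bar{X}_k$ take the form of (1), $D_k = X_k \bar{X}_k^T$ and $D = E[X_1 \bar{X}_1^T]$. Then, for $p$ satisfying $p < \frac{1}{2-\sigma-\bar{\sigma}} \wedge \alpha \wedge 2$,
\[ \lim_{n\to\infty} \frac{1}{n^{1/p}} \sum_{k=1}^{n} (D_k - D) = 0 \quad \text{a.s.} \]

This theorem follows by linearity of limits and Theorem 3.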


3.1. Applications 

We give some applications of our theorems. 

3.1.1. Stochastic Approximation 

Stochastic approximation (SA) is often used in optimization problems for linear models. Hence, the convergence properties of SA algorithms driven by linear models are of utmost interest. For illustration, we assume $\{z_k, k = 1, 2, \ldots\}$ and $\{y_k, k = 2, 3, \ldots\}$ are respectively $\mathbb{R}^d$- and $\mathbb{R}$-valued stochastic processes, defined on some probability space $(\Omega, \mathcal{F}, P)$, that satisfy
\[ y_{k+1} = z_k^T h + \epsilon_k, \quad \forall k = 1, 2, \ldots, \tag{8} \]
where $h$ is an unknown $d$-dimensional parameter or weight vector of interest and $\{\epsilon_k\}$ is a noise sequence. We want to estimate the parameter vector $h$ through the stochastic approximation algorithm
\[ h_{k+1} = h_k + \mu_k (b_k - A_k h_k), \tag{9} \]
where $\mu_k$ is the $k$th step gain of the form $\mu_k = k^{-\chi}$ for some $\chi \in (\frac{1}{2}, 1]$, $A_k = z_k z_k^T$ and $b_k = y_{k+1} z_k$.
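The recursion (9) can be sketched in a few lines. The example below is a simplified illustration, not the setting of the theorems: it assumes i.i.d. bounded regressors and i.i.d. bounded noise rather than heavy-tailed, long-range-dependent linear processes, and it uses the identity $b_k - A_k h_k = z_k (y_{k+1} - z_k^T h_k)$ to avoid forming the matrix $A_k$. The names `sa_step` and `run_sa` and all parameter values are hypothetical.

```python
import random


def sa_step(h_k, z_k, y_next, k, chi):
    """One update h_{k+1} = h_k + mu_k * (b_k - A_k h_k) with mu_k = k ** (-chi)."""
    mu_k = k ** (-chi)
    # b_k - A_k h_k = z_k * (y_{k+1} - z_k^T h_k), so only a residual is needed.
    residual = y_next - sum(z * h for z, h in zip(z_k, h_k))
    return [h + mu_k * z * residual for h, z in zip(h_k, z_k)]


def run_sa(h_true, n_steps, chi=0.6, noise_scale=0.1, seed=0):
    """Run the recursion on synthetic data y_{k+1} = z_k^T h + e_k (i.i.d. toy inputs)."""
    rng = random.Random(seed)
    h_k = [0.0] * len(h_true)
    for k in range(1, n_steps + 1):
        z_k = [rng.uniform(-1.0, 1.0) for _ in h_true]   # assumed i.i.d. regressors
        e_k = noise_scale * rng.uniform(-1.0, 1.0)       # assumed i.i.d. noise
        y_next = sum(z * h for z, h in zip(z_k, h_true)) + e_k
        h_k = sa_step(h_k, z_k, y_next, k, chi)
    return h_k


# Usage: the estimate should end up near the true weight vector.
print(run_sa([1.0, -2.0], 5000))
```

In this toy setting the gain exponent only trades off bias decay against noise averaging; under heavy tails and long memory the attainable rate is governed by the conditions in the theorems of this section.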

Kouritzin and Sadeghi [9] studied the convergence and almost sure rates of convergence for the algorithm (9). Now, we can combine our main result (Theorem 4) with [9, Corollary 2] to obtain a rate of convergence result for stochastic approximation.
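Theorem 5 Let $\{\Xi_l\}$ be i.i.d. zero-mean random $\mathbb{R}^m$-vectors such that $\sup_{t\ge 0} t^{\alpha} P(|\Xi_1|^2 > t) < \infty$ for some $\alpha \in (1, 2)$; $(C_l)_{l\in\mathbb{Z}}$ be $\mathbb{R}^{(d+1)\times m}$-matrices such that $\sup_{l\in\mathbb{Z}} |l|^{\sigma}\|C_l\| < \infty$ for some $\sigma \in (\frac{1}{2}, 1]$;
\[ (z_k^T, y_{k+1})^T = \sum_{l=-\infty}^{\infty} C_{k-l}\,\Xi_l; \]
$A_k = z_k z_k^T$ and $b_k = y_{k+1} z_k$; and $A = E[z_k z_k^T]$ and $b = E[y_{k+1} z_k]$. Then, $|h_n - h| = o(n^{-\gamma})$ as $n \to \infty$ a.s. for any $\gamma < \gamma_0(\chi) = (\chi - \frac{1}{\alpha}) \wedge (\chi + 2\sigma - 2)$.

Proof. By Theorem 4 with $\frac{1}{p} = \chi - \gamma$, $X_k = \bar{X}_k = (z_k^T, y_{k+1})^T$, $\bar{\Xi}_l = \Xi_l$, $\bar{C}_l = C_l$, $\bar{\sigma} = \sigma$, and
\[ D_k = \begin{pmatrix} z_k z_k^T & y_{k+1} z_k \\ y_{k+1} z_k^T & y_{k+1}^2 \end{pmatrix}, \]
we have
\[ \frac{1}{n^{\chi - \gamma}} \sum_{k=1}^{n} (D_k - D) \to 0 \quad \text{a.s.}, \qquad \text{where} \quad D = \begin{pmatrix} A & b \\ b^T & E[y_{k+1}^2] \end{pmatrix}. \]
The first $d$ rows of $\frac{1}{n^{\chi-\gamma}} \sum_{k=1}^{n} (D_k - D) \to 0$ a.s. then establish the MSLLN
\[ \frac{1}{n^{\chi - \gamma}} \sum_{k=1}^{n} (A_k - A) \to 0 \quad \text{and} \quad \frac{1}{n^{\chi - \gamma}} \sum_{k=1}^{n} (b_k - b) \to 0 \quad \text{a.s.} \]
Now, we apply [9, Corollary 2] to complete the proof. □

Remark 3 Note that $\chi - \gamma$ satisfies the required conditions $\chi - \gamma > 2 - 2\sigma$ and $\chi - \gamma > \frac{1}{\alpha}$ in Theorem 4. Theorem 5 also appears in [9, Theorem 7].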
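The rate exponent in Theorem 5 is a minimum of a heavy-tail term and a long-range-dependence term, which can be tabulated directly. The helper below is a small sketch with a hypothetical name, `sa_rate_exponent`; it only evaluates the formula $\gamma_0(\chi) = (\chi - 1/\alpha) \wedge (\chi + 2\sigma - 2)$ and does not check the hypotheses of the theorem.

```python
def sa_rate_exponent(chi, alpha, sigma):
    """Return gamma_0 = min(chi - 1/alpha, chi + 2*sigma - 2).

    A non-positive value means the formula yields no polynomial rate
    for that choice of gain exponent chi.
    """
    heavy_tail_term = chi - 1.0 / alpha        # requires 1/p = chi - gamma > 1/alpha
    lrd_term = chi + 2.0 * sigma - 2.0         # requires 1/p = chi - gamma > 2 - 2*sigma
    return min(heavy_tail_term, lrd_term)


# Usage: with chi = 1, alpha = 1.6, sigma = 0.9 the heavy-tail term is the smaller one.
print(sa_rate_exponent(1.0, 1.6, 0.9))
```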


3.1.2. Non-linear Function of linear processes 

As mentioned in the Background, Vaiciulis [19] showed the convergence in distribution of the partial sum processes with non-linear $h(x_k)$ in terms of convergence of Appell polynomials $A_m(x_k)$ of a long-memory moving average process $\{x_k\}$ with i.i.d. innovations $\{\xi_k\}$ in the case where the variance $E A_m^2(x_k) = \infty$ and the distribution of $\xi_k$ belongs to the domain of attraction of an $\alpha$-stable law with $1 < \alpha < 2$.

Practically, the simplest examples of functions $h(x)$ with a given Appell rank $m$ are the Appell polynomials $h = A_m$ relative to the marginal distribution $x_1$ of the linear process (3). In the case $m = 2$ the Appell polynomial is $A_2(x) = x^2 - \mu_2$ where $\mu_2 = E x_1^2$. Vaiciulis [19, Theorems 1.1 and 1.2] proved that when $m(2\sigma - 1) < 1$, $m \ge 2$ and $\sigma \in (\frac{1}{2}, 1)$ the limit distribution of partial sums of the $m$th Appell polynomial is either (i) an $\alpha$-stable Levy process for $2 - 2\sigma < 1 + \frac{2}{m}(\frac{1}{\alpha} - 1)$, or (ii) an $m$th order Hermite process for $2 - 2\sigma > 1 + \frac{2}{m}(\frac{1}{\alpha} - 1)$, or (iii) the sum of two mutually independent processes depending on the values of $\alpha$, $m$ and $\sigma$, for $2 - 2\sigma = 1 + \frac{2}{m}(\frac{1}{\alpha} - 1)$.

Taking into account all his conditions (when $t = 1$) and transforming them to our case, we write our complementary almost sure rate-of-convergence theorem.

Theorem 6 Suppose $A_2$ represents the Appell polynomial with rank 2 relative to the marginal distribution $x_1$ of the linear process $x_k = \sum_{j=0}^{\infty} c_j \xi_{k-j}$, and let $p \in [1, \frac{1}{2-2\sigma} \wedge \alpha)$, where
\[ \sup_{t\ge 0} t^{\alpha} P(\xi_1^2 > t) < \infty \quad \text{for some } \alpha \in (1, 2), \tag{10} \]
\[ \sup_{l\in\mathbb{Z}} |l|^{\sigma}|c_l| < \infty \quad \text{for some } \sigma \in \left( \tfrac{1}{2}, 1 \right]. \tag{11} \]
Then,
\[ \lim_{n\to\infty} \frac{1}{n^{1/p}} \sum_{k=1}^{n} A_2(x_k) = 0 \quad \text{a.s.} \]

One might wonder if we have obtained the best possible MSLLN. Indeed, we have. For example, for $m = 2$, Vaiciulis [19] shows convergence in distribution of suitably normalized $\sum_{k=1}^{n} A_2(x_k)$ to different non-trivial limits in the cases $(2 - 2\sigma) > \frac{1}{\alpha}$ (LRD dominant) or $(2 - 2\sigma) < \frac{1}{\alpha}$ (HT dominant), respectively. Therefore, $\frac{1}{n^{(2-2\sigma)\vee\frac{1}{\alpha}}} \sum_{k=1}^{n} A_2(x_k)$ cannot converge to zero almost surely. Theorem 6 gives the MSLLN for Appell polynomials with rank 2; in other words, it gives the convergence and almost sure rates of convergence for partial sums of the second Appell polynomial when $\frac{1}{p} > (2 - 2\sigma) \vee \frac{1}{\alpha}$. Our result is optimal in the polynomial sense and we cannot do better than that in terms of MSLLN.


3.1.3. Autocovariances 

As mentioned in the Background, autocovariance estimation under HT and LRD conditions is an active area of research. We will handle the asymptotic behavior of the sample covariance function for processes with LRD and innovations of infinite 4th moment and finite variance. If we define the sample autocovariance and population autocovariance functions $\hat{\gamma}_h^{(n)}$ and $\gamma_h$ as in (6), we have the following almost sure result.

Theorem 7 Assume $\hat{\gamma}_h^{(n)}$ and $\gamma_h$ are as in (6), in which $x_k = \sum_{j=0}^{\infty} c_j \xi_{k-j}$ satisfies (10) and (11) with $E[\xi_1^2] = 1$. Then, for $p$ satisfying $p < \frac{1}{2-2\sigma} \wedge \alpha \wedge 2$,
\[ n^{1 - \frac{1}{p}} \left[ \hat{\gamma}_h^{(n)} - \gamma_h \right] \to 0 \quad \text{a.s.} \tag{12} \]
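Proof. Note that in Theorem 3, for the case $E[\xi_1^2] = 1$, $\bar{\xi}_l = \xi_l$, $\bar{c}_l = c_{l+h}$ and $\{c_l = 0,\ \forall\, l < 0\}$ we have
\[ d_k = \sum_{l=-\infty}^{k} \sum_{m=-\infty}^{k+h} c_{k-l} c_{k+h-m} \xi_l \xi_m \quad \text{and} \quad d = \sum_{l=0}^{\infty} c_l c_{l+h}. \]
Hence,
\[ \frac{1}{n^{1/p}} \sum_{k=1}^{n} (d_k - d) = \frac{1}{n^{1/p}} \sum_{k=1}^{n} \left( \sum_{l=-\infty}^{k} \sum_{m=-\infty}^{k+h} c_{k-l} c_{k+h-m} \xi_l \xi_m - \sum_{l=0}^{\infty} c_l c_{l+h} \right). \tag{13} \]
On the other hand, (12) can be written as
\[ n^{1-\frac{1}{p}} \left[ \hat{\gamma}_h^{(n)} - \gamma_h \right] = \frac{1}{n^{1/p}} \sum_{k=1}^{n} (x_k x_{k+h} - E x_0 x_h) = \frac{1}{n^{1/p}} \sum_{k=1}^{n} \left( \sum_{l=-\infty}^{k} \sum_{m=-\infty}^{k+h} c_{k-l} c_{k+h-m} \xi_l \xi_m - \sum_{l=0}^{\infty} c_l c_{l+h} \right). \tag{14} \]
So, the result follows. □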

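As we saw, Theorem 2 gives the convergence to the following non-trivial limits for $\frac{2\alpha-1}{2\alpha} < \sigma < 1$ and $\frac{1}{2} < \sigma < \frac{2\alpha-1}{2\alpha}$ when $1 < \alpha < 2$:
\[ \text{(a)} \quad \frac{1}{a_n^2} \sum_{k=1}^{n} (x_k x_{k+h} - E x_0 x_h) \xrightarrow{d} \left( S - \frac{\alpha}{\alpha - 1} \right) \sum_{l=0}^{\infty} c_l c_{l+h}, \]
\[ \text{(b)} \quad \frac{1}{n^{2-2\sigma}} \sum_{k=1}^{n} (x_k x_{k+h} - E x_0 x_h) \xrightarrow{d} c_{\sigma}^2 \left[ U_{\sigma}(1) \right], \]
respectively, for $h = 0, 1, \ldots, v$.

It is clear that in the HT dominant case, $\frac{1}{\alpha} > 2 - 2\sigma$, we have almost sure convergence (Theorem 7) when $\frac{1}{p} > \frac{1}{\alpha}$. When $\frac{1}{p} = \frac{1}{\alpha}$, we get into case (a) and have convergence to an $\alpha$-stable distribution. On the other hand, in the LRD dominant case, $\frac{1}{\alpha} < 2 - 2\sigma$, from Theorem 7 we have almost sure convergence for $\frac{1}{p} > 2 - 2\sigma$, yet for $\frac{1}{p} = 2 - 2\sigma$ we have convergence to a Rosenblatt process by (b).

Hence, Theorem 7 shows the a.s. convergence for the difference of the sample autocovariance and the population autocovariance with HT and LRD. One example is the case $h = 0$. Theorem 2 and (15) give the convergence in distribution
\[ \frac{1}{n^{1/p}} \sum_{k=1}^{n} (x_k^2 - E x_k^2) \xrightarrow{d} \left( S - \frac{\alpha}{\alpha - 1} \right) \sum_{l=0}^{\infty} c_l^2 \quad \text{and} \quad \frac{1}{n^{1/p}} \sum_{k=1}^{n} (x_k^2 - E x_k^2) \xrightarrow{d} c_{\sigma}^2 \left[ U_{\sigma}(1) \right], \]
for $\frac{1}{p} = \frac{1}{\alpha}$ and $\frac{1}{p} = 2 - 2\sigma$, respectively, while Theorem 7 gives
\[ \frac{1}{n^{1/p}} \sum_{k=1}^{n} (x_k^2 - E x_k^2) \to 0 \quad \text{a.s.} \]
when $\frac{1}{p} > (2 - 2\sigma) \vee \frac{1}{\alpha}$.

When we have convergence in distribution to non-trivial limits we cannot get almost sure convergence to 0. However, by Theorem 7 we can get arbitrarily close to that with a polynomial rate and obtain the optimal polynomial almost sure rate of convergence. We cannot do better than that in terms of MSLLN.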

4. Proofs 

4.1. Notation List

$|x|$ is the Euclidean norm of an $\mathbb{R}^d$-vector $x$.

$\|C\| = \sup_{|x|=1} |Cx|$ for any $\mathbb{R}^{n\times m}$-matrix $C$.

$\lfloor t \rfloor = \max\{i \in \mathbb{N}_0 : i \le t\}$ and $\lceil t \rceil = \min\{i \in \mathbb{N}_0 : i \ge t\}$ for any $t \ge 0$.

$a_{i,k} \overset{i}{\ll} b_{i,k}$ means that for each $k$ there is a $c_k > 0$ that does not depend upon $i$ such that $|a_{i,k}| \le c_k |b_{i,k}|$ for all $i, k$.



l=p 

$a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$.

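4.2. A First Light Tail Result

We first give a result that only handles long-range dependence without heavy tails. However, our proof of Theorem 3 to follow will show that these two phenomena decouple, so we can easily build upon Theorem 8 to handle both long-range dependence and heavy tails together.

Theorem 8 Let $\{(\xi_l, \bar{\xi}_l),\ l \in \mathbb{Z}\}$ be i.i.d. zero-mean random variables such that $E[(1 + \xi_1^2)(1 + \bar{\xi}_1^2)] < \infty$, and let $(c_l, \bar{c}_l)_{l\in\mathbb{Z}}$ satisfy
\[ \sup_{l\in\mathbb{Z}} |l|^{\sigma}|c_l| < \infty, \quad \sup_{l\in\mathbb{Z}} |l|^{\bar{\sigma}}|\bar{c}_l| < \infty \quad \text{for some } \sigma, \bar{\sigma} \in \left( \tfrac{1}{2}, 1 \right], \]
\[ x_k = \sum_{l=-\infty}^{\infty} c_{k-l}\xi_l, \quad \bar{x}_k = \sum_{l=-\infty}^{\infty} \bar{c}_{k-l}\bar{\xi}_l, \quad d_k = x_k \bar{x}_k = \sum_{l,m=-\infty}^{\infty} c_{k-l}\bar{c}_{k-m}\xi_l\bar{\xi}_m \]
and
\[ d = E[\xi_1\bar{\xi}_1] \sum_{l=-\infty}^{\infty} c_{k-l}\bar{c}_{k-l} = E[\xi_1\bar{\xi}_1] \sum_{l=-\infty}^{\infty} c_l\bar{c}_l. \]
Then, for $p < 2 \wedge \frac{1}{2-\sigma-\bar{\sigma}}$,
\[ \lim_{n\to\infty} \frac{1}{n^{1/p}} \sum_{k=1}^{n} (d_k - d) = 0 \quad \text{a.s.} \]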
Proof. Insomuch as the proof of the general case only differs cosmetically from the notationally-simpler case where $\bar{\xi}_l = \xi_l$ and $c_l = \bar{c}_l = |l|^{-\sigma}$ for $l \ne 0$, we only provide the proof of the latter, for which the constraint becomes $p < 2 \wedge \frac{1}{2-2\sigma}$. Assume without loss of generality that $\sigma < 1$ and $E[\xi_1^2] = 1$.

Step 1: Divide partial sums into diagonal, large c, small c and mixed type terms.

Let $n_r = 2^r$ and $T = T(n) = n_r^{\nu}$ for $\nu > 0$, $n \in [n_r, n_{r+1})$ and $r \in \mathbb{N}_0$, and define
\[ S_n^{(1)} = \sum_{k=1}^{n} \sum_{l=-\infty}^{\infty} c_{k-l}^2 (\xi_l^2 - 1), \tag{15} \]
\[ S_n^{(2)} = \sum_{k=1}^{n} \sum_{\substack{l,m=k-T \\ l \ne m}}^{k+T} c_{k-l} c_{k-m} \xi_l \xi_m, \tag{16} \]
\[ S_n^{(3)} = \sum_{k=1}^{n} \sum_{\substack{(l-k)\wedge(m-k) > T \\ l \ne m}} c_{k-l} c_{k-m} \xi_l \xi_m, \tag{17} \]
\[ S_n^{(4)} = \sum_{k=1}^{n} \sum_{m-k>T} \sum_{l=k-T}^{k+T} c_{k-l} c_{k-m} \xi_l \xi_m. \tag{18} \]
By breaking $\left\{ \frac{1}{n^{1/p}} \sum_{k=1}^{n} (d_k - d),\ n = 1, 2, \ldots \right\}$ into pieces and considering those pieces with different (process) distributions, we just need to show that
\[ \lim_{n\to\infty} \frac{S_n^{(1)}}{n^{1/p}} = \lim_{n\to\infty} \frac{S_n^{(2)}}{n^{1/p}} = \lim_{n\to\infty} \frac{S_n^{(3)}}{n^{1/p}} = \lim_{n\to\infty} \frac{S_n^{(4)}}{n^{1/p}} = 0 \quad \text{a.s.}, \]
provided $p < 2 \wedge \frac{1}{2-2\sigma}$. To handle (the diagonal terms) $S_n^{(1)}$ we let $\zeta_l = \xi_l^2 - 1$, set $\kappa = E[\zeta_1^2]$ and use standard steps.

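Step 2: Bound second moment of geometric diagonal partial sums $S_{n_r}^{(1)}$.

By symmetry and then integral approximation, we have that
\[ E\big[ (S_{n_r}^{(1)})^2 \big] = \sum_{l=-\infty}^{\infty} \sum_{j=1}^{n_r} \sum_{k=1}^{n_r} c_{j-l}^2 c_{k-l}^2\, E[\zeta_l^2] = \kappa \sum_{l=-\infty}^{\infty} \left( \sum_{k=1}^{n_r} c_{k-l}^2 \right)^2 \]
\[ \ll \sum_{k=1}^{n_r} \left( 1 + 2\sum_{l=1}^{\infty} l^{-4\sigma} + 2 \sum_{j=k+1}^{n_r} \left[ (j-k)^{-2\sigma} + \sum_{l=-\infty}^{k-1} (k-l)^{-2\sigma}(j-l)^{-2\sigma} + \sum_{l=k+1}^{j-1} (l-k)^{-2\sigma}(j-l)^{-2\sigma} + \sum_{l=j+1}^{\infty} (l-k)^{-2\sigma}(l-j)^{-2\sigma} \right] \right) \]
\[ \ll \sum_{k=1}^{n_r} \left( 1 + \sum_{j=k+1}^{n_r} \left( (j-k)^{-2\sigma} + (j-k)^{1-4\sigma} \right) \right) \ll n_r. \tag{19} \]
Note:
\[ \sum_{l=k+1}^{j-1} (l-k)^{-2\sigma}(j-l)^{-2\sigma} \ll \sum_{l=k+1}^{\lfloor \frac{j+k}{2} \rfloor} (l-k)^{-2\sigma}(j-l)^{-2\sigma} \ll (j-k)^{-2\sigma} \sum_{l=k+1}^{\lfloor \frac{j+k}{2} \rfloor} (l-k)^{-2\sigma} \ll (j-k)^{-2\sigma}. \]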


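Step 3: Maximal bound for geometric diagonal partial sums.

Following (19) we have for $n_r \le n < o < n_{r+1}$
\[ E\big[ (S_o^{(1)} - S_n^{(1)})^2 \big] \le \kappa \sum_{l=-\infty}^{\infty} \left( \sum_{k=n+1}^{o} c_{k-l}^2 \right)^2 \ll \sum_{k=n+1}^{o} \left( 1 + \sum_{j=k+1}^{o} \left( (j-k)^{-2\sigma} + (j-k)^{1-4\sigma} \right) \right) \overset{o,n}{\ll} o - n. \tag{21} \]
Therefore, it follows by Theorem 2.4.1 of Stout [15] with $g(a, n) = Cn$ for some constant $C > 0$ that
\[ E\left[ \max_{n_r \le n < o < n_{r+1}} \big( S_o^{(1)} - S_n^{(1)} \big)^2 \right] \ll \left( \frac{\log(2(n_{r+1} - n_r))}{\log 2} \right)^2 (n_{r+1} - n_r) \ll r^2 n_r. \tag{22} \]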


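Step 4: Use previous two steps to show normalized diagonal sums converge.

Combining (19) and (22), one has that
\[ \sum_{r=0}^{\infty} E\left[ \max_{n_r \le n < n_{r+1}} \left( \frac{S_n^{(1)}}{n^{1/p}} \right)^2 \right] \ll \sum_{r=0}^{\infty} r^2 n_r^{1 - \frac{2}{p}} < \infty, \tag{23} \]
provided $p \in (0, 2)$. It follows by Fubini's Theorem and $n$th term divergence that
\[ \lim_{n\to\infty} \frac{S_n^{(1)}}{n^{1/p}} = 0 \quad \text{a.s.} \]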
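Step 5: Set up for off-diagonal terms.

Letting
\[ a_{l,m}^{2,n} = 2 \sum_{k=1}^{n} 1_{m-T \le k \le l+T}\, c_{k-l} c_{k-m}, \tag{24} \]
\[ a_{l,m}^{3,n} = 2 \sum_{k=1}^{n} 1_{k < l-T}\, c_{k-l} c_{k-m}, \tag{25} \]
\[ a_{l,m}^{4,n} = \sum_{k=1}^{n} 1_{k < m-T}\, 1_{l-T \le k \le l+T}\, c_{k-l} c_{k-m}, \tag{26} \]
we find that
\[ E\big[ (S_n^{(i)})^2 \big] = \sum_{l_1=-\infty}^{\infty} \sum_{m_1=l_1+1}^{\infty} \sum_{l_2=-\infty}^{\infty} \sum_{m_2=l_2+1}^{\infty} a_{l_1,m_1}^{i,n}\, a_{l_2,m_2}^{i,n}\, E[\xi_{l_1}\xi_{m_1}\xi_{l_2}\xi_{m_2}] = \sum_{l=-\infty}^{\infty} \sum_{m=l+1}^{\infty} \big( a_{l,m}^{i,n} \big)^2 \tag{27} \]
and for $n_r \le n < o < n_{r+1}$
\[ E\big[ (S_o^{(i)} - S_n^{(i)})^2 \big] = \sum_{l=-\infty}^{\infty} \sum_{m=l+1}^{\infty} \big( a_{l,m}^{i,o} - a_{l,m}^{i,n} \big)^2 \tag{28} \]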


for $i = 2, 3, 4$. Using a change of variables and the Beta distribution pdf, we have that
\[ \sum_{l=k+1}^{j-1} c_{j-l} c_{k-l} \le \int_k^j (j-t)^{-\sigma}(t-k)^{-\sigma}\, dt = (j-k)^{1-2\sigma} \int_0^1 (1-s)^{-\sigma} s^{-\sigma}\, ds = (j-k)^{1-2\sigma}\, B(1-\sigma, 1-\sigma) \ll (j-k)^{1-2\sigma}. \tag{29} \]
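The Beta-function bound in (29) can be checked numerically for the model coefficients. The sketch below assumes $c_l = |l|^{-\sigma}$ with $\sigma \in (\frac{1}{2}, 1)$ and writes $d = j - k$; it compares the discrete convolution sum with $d^{1-2\sigma} B(1-\sigma, 1-\sigma)$, using $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$. The function names are hypothetical.

```python
import math


def beta_fn(a, b):
    """Beta function via the Gamma function: B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def convolution_sum(d, sigma):
    """sum_{l=k+1}^{j-1} (j - l)**(-sigma) * (l - k)**(-sigma), written with u = l - k and d = j - k."""
    return sum((d - u) ** (-sigma) * u ** (-sigma) for u in range(1, d))


def beta_bound(d, sigma):
    """Right-hand side of the bound: d**(1 - 2*sigma) * B(1 - sigma, 1 - sigma)."""
    return d ** (1.0 - 2.0 * sigma) * beta_fn(1.0 - sigma, 1.0 - sigma)


# Usage: the ratio sum / bound stays below 1 for each gap d.
for d in (2, 10, 100, 1000):
    print(d, convolution_sum(d, 0.75) / beta_bound(d, 0.75))
```

The comparison works because the summand is convex and symmetric in $u$ on $(0, d)$, so each term is dominated by the integral over an adjacent unit interval inside $(0, d)$.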

Step 6: Apply $S^{(1)}$-procedure for convergence of large c terms $\frac{S_n^{(2)}}{n^{1/p}}$.

Using (29) and integral approximation, one has for $n \in [n_r, n_{r+1})$

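\[ E\big[ (S_n^{(2)})^2 \big] = 4 \sum_{k=1}^{n} \sum_{m>l} 1_{k-T \le m \le k+T} \cdot 1_{k-T \le l \le k+T}\, c_{k-l}^2 c_{k-m}^2 + 8 \sum_{j>k} \sum_{m>l} 1_{j-T \le m \le k+T} \cdot 1_{j-T \le l \le k+T}\, c_{j-l} c_{j-m} c_{k-l} c_{k-m} \]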
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<*E 

j>k 


k+T 

E Cj-lCk-l 

l=j—T 


k -1 j-1 k+T 

2 Cj-k + E C j—l c k—l + E c j—l c k—l + E c j~l c k~l 

l=j—T l=k -\-1 / =j +1 


n nA(/c+2T) 

^ 4 E e 

k—1 j=k -\-1 

n k-\-2T 

<<EE [c?- k)~ 2a + u -fc) 2 - 4CT + u -fc)- 2 CT t 2 - 2ct 

fc—1 J —/c+l 
n 

nl(n), 


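where
\[ l(n) = \begin{cases} T^{3-4\sigma} = n_r^{\nu(3-4\sigma)} & \sigma < \frac{3}{4} \\ \log(T) = \nu \log(n_r) & \sigma = \frac{3}{4} \\ 1 & \sigma > \frac{3}{4}. \end{cases} \]
Hence,
\[ E\big[ (S_n^{(2)})^2 \big] \ll n\, l(n) + \sum_{k=1}^{n} \left( \sum_{l=-T}^{T} c_l^2 \right)^2 \ll n\, l(n). \tag{30} \]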


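Similarly, we have for $n_r \le n < o < n_{r+1}$ that
\[ E\big[ (S_o^{(2)} - S_n^{(2)})^2 \big] \overset{o,n}{\ll} \sum_{k=n+1}^{o} \left( \sum_{l=-T}^{T} c_l^2 \right)^2 + \sum_{\substack{j,k=n+1 \\ j>k}}^{o} \left( \sum_{l=j-T}^{k+T} c_{j-l} c_{k-l} \right)^2 \ll (o - n)\, l(n). \tag{31} \]
Therefore, it follows by Theorem 2.4.1 of Stout that
\[ E\left[ \max_{n_r \le n < o < n_{r+1}} \big( S_o^{(2)} - S_n^{(2)} \big)^2 \right] \ll \left( \frac{\log(2 n_r)}{\log 2} \right)^2 n_r\, l(n_r) \ll r^2 n_r\, l(n_r). \tag{32} \]
Combining (30) with $n = n_r$ and (32), one has that
\[ E\left[ \sum_{r=0}^{\infty} \max_{n_r \le n < n_{r+1}} \left( \frac{S_n^{(2)}}{n^{1/p}} \right)^2 \right] \ll \sum_{r=0}^{\infty} r^2 n_r^{1 - \frac{2}{p}}\, l(n_r) < \infty, \tag{33} \]
provided $1 + \nu(3 - 4\sigma) \vee 0 < \frac{2}{p}$ (i.e. $p < \frac{2}{1 + \nu(3-4\sigma)}$ when $\sigma < \frac{3}{4}$ and $p < 2$ when $\sigma \ge \frac{3}{4}$, both of which are true). It follows that $\lim_{n\to\infty} \frac{S_n^{(2)}}{n^{1/p}} = 0$ a.s.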
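Step 7: Apply $S^{(1)}$-procedure for convergence of small c terms $\frac{S_n^{(3)}}{n^{1/p}}$.

\[ E\big[ (S_n^{(3)})^2 \big] = 8 \sum_{j>k} \sum_{m>l} 1_{j+T<l} \cdot 1_{k+T<l}\, c_{j-l} c_{j-m} c_{k-l} c_{k-m} + 4 \sum_{k=1}^{n} \sum_{m>l} 1_{k+T<l}\, c_{k-l}^2 c_{k-m}^2 \]
\[ \ll \sum_{j>k} \left( \sum_{l=j+T+1}^{\infty} c_{j-l} c_{k-l} \right)^2 + \sum_{k=1}^{n} \left( \sum_{l=k+T+1}^{\infty} c_{k-l}^2 \right)^2 \]
\[ \ll \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \left( \int_{j+T}^{\infty} (t-j)^{-\sigma}(t-k)^{-\sigma}\, dt \right)^2 + \sum_{k=1}^{n} \left( \int_{k+T}^{\infty} (t-k)^{-2\sigma}\, dt \right)^2 \]
\[ \ll \sum_{k=1}^{n} \left( \sum_{j=k+1}^{n} \left( \int_{T}^{\infty} t^{-2\sigma}\, dt \right)^2 + \left( \int_{T}^{\infty} t^{-2\sigma}\, dt \right)^2 \right) \ll n^2 T^{2-4\sigma}. \tag{34} \]
Similarly, we have for $n_r \le n < o < n_{r+1}$ that
\[ E\big[ (S_o^{(3)} - S_n^{(3)})^2 \big] \ll (o - n)\, o\, T^{2-4\sigma} \ll (o - n)\, n_r^{1 + \nu(2-4\sigma)}. \tag{35} \]
Therefore, it follows by Theorem 2.4.1 of Stout that
\[ E\left[ \max_{n_r \le n < o < n_{r+1}} \big( S_o^{(3)} - S_n^{(3)} \big)^2 \right] \ll \left( \frac{\log(2 n_r)}{\log 2} \right)^2 (n_{r+1} - n_r)\, n_r^{1 + \nu(2-4\sigma)} \ll r^2 n_r^{2 + \nu(2-4\sigma)}. \tag{36} \]
Combining (34) with $n = n_r$ and (36), one has
\[ E\left[ \sum_{r=0}^{\infty} \max_{n_r \le n < n_{r+1}} \left( \frac{S_n^{(3)}}{n^{1/p}} \right)^2 \right] \ll \sum_{r=0}^{\infty} r^2 n_r^{2 + \nu(2-4\sigma) - \frac{2}{p}} < \infty, \tag{37} \]
provided $p < \frac{2}{2 + \nu(2-4\sigma)}$, in which case $\lim_{n\to\infty} \frac{S_n^{(3)}}{n^{1/p}} = 0$ a.s.

It is notable that the condition $p < \frac{2}{1 + \nu(3-4\sigma)}$ in Step 6 gets more stringent when $\nu > 1$, and the same is true for the condition $p < \frac{2}{2 + \nu(2-4\sigma)}$ in Step 7 when $\nu < 1$, so the best choice, which yields the same condition on $p$ in both steps, is $\nu = 1$. Hence, we will have to satisfy $p < \frac{1}{2-2\sigma}$ in either case.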
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Step 8: Apply S'^-procedure for convergence of mixed terms —t-. 

n P 

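Finally, we note

$$\begin{aligned}
E\left[\left(S_n^{(4)}\right)^2\right]
&= \sum_{k=1}^{n}\sum_{m=k+T+1}^{\infty} c_{k-m}^2 \sum_{l=k-T}^{k+T} c_{k-l}^2 + 2\sum_{k=1}^{n}\sum_{j=k+1}^{k+2T}\sum_{m=j+T+1}^{\infty} c_{j-m}c_{k-m} \sum_{l=j-T}^{k+T} c_{j-l}c_{k-l} \\
&\ll \sum_{k=1}^{n}\left\{ T^{1-2\sigma} + \sum_{j=k+1}^{k+2T} T^{1-2\sigma}\big[\,\cdots\,\big] \right\} \\
&\ll n\, T^{3-4\sigma}.
\end{aligned}$$

Similarly, we have for $n_r \le n < o < n_{r+1}$ that

$$E\left[\left(S_o^{(4)} - S_n^{(4)}\right)^2\right] \ll (o-n)\, T^{3-4\sigma}.$$

Therefore, it follows by $\nu = 1$ and Theorem 2.4.1 of Stout that

$$E\left[\max_{n_r \le n < o < n_{r+1}} \left(S_o^{(4)} - S_n^{(4)}\right)^2\right] \ll \left(\frac{\log(2n_r)}{\log 2}\right)^{2} (n_{r+1} - n_r)\, n_r^{3-4\sigma} \ll r^2\, n_r^{4-4\sigma}.$$

Combining these two equations, one has

$$E\left[\sum_{r=0}^{\infty} \max_{n_r \le n < n_{r+1}} \left(\frac{S_n^{(4)}}{n^{1/p}}\right)^{2}\right] \ll \sum_{r=0}^{\infty} r^2\, n_r^{4-4\sigma-\frac{2}{p}} < \infty, \tag{38}$$

provided $p < \frac{1}{2-2\sigma}$, which is true. It follows that $\lim_{n\to\infty} \frac{S_n^{(4)}}{n^{1/p}} = 0$ a.s. □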
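The passage from a summable expectation such as (38) to almost sure convergence is the same in Steps 6 to 8. A short sketch follows, under the assumption (suggested by the factor $\log(2n_r)/\log 2 \ll r$, but not restated in this section) that the subsequence is geometric, say $n_r = 2^r$:

```latex
% M_r denotes the r-th block maximum; it is non-negative.
\begin{align*}
M_r &:= \max_{n_r \le n < n_{r+1}} \left(\frac{S_n^{(4)}}{n^{1/p}}\right)^{2},
\qquad
E\Big[\sum_{r\ge 0} M_r\Big] < \infty
\;\Longrightarrow\; \sum_{r\ge 0} M_r < \infty \ \text{a.s.}
\;\Longrightarrow\; M_r \to 0 \ \text{a.s.} \\
% Every n lies in exactly one block [n_r, n_{r+1}), with r = r(n) -> infinity:
\left(\frac{S_n^{(4)}}{n^{1/p}}\right)^{2} &\le M_{r(n)} \longrightarrow 0 \quad\text{a.s. as } n\to\infty. \\
% Summability with n_r = 2^r (assumed) and delta := 2/p - (4 - 4 sigma):
\sum_{r\ge 0} r^2\, n_r^{4-4\sigma-\frac{2}{p}}
  &= \sum_{r\ge 0} r^2\, 2^{-\delta r} < \infty
  \iff \delta > 0 \iff p < \frac{1}{2-2\sigma}.
\end{align*}
```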


4.3. Proof of Theorem 3


Without loss of generality we assume $1 < \alpha < 2$.

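Step 1: Reduce to continuous $\{(\xi_l, \bar\xi_l)\}$.

Let $\{U_l\}_{l\in\mathbb{Z}}$ be independent $[-1,1]$-uniform random variables that are independent of everything and set $\bar U_l = U_l$ for all $l$. Then, we have that

$$\begin{aligned}
\frac{1}{n^{1/p}}\sum_{k=1}^{n}(d_k - d)
&= \frac{1}{n^{1/p}}\sum_{k=1}^{n}\sum_{l,m=-\infty}^{\infty} c_{k-l}\bar c_{k-m}\Big[(\xi_l + U_l)(\bar\xi_m + \bar U_m) - E\big[(\xi_l + U_l)(\bar\xi_m + \bar U_m)\big]\Big] \\
&\quad - \frac{1}{n^{1/p}}\sum_{k=1}^{n}\sum_{l,m=-\infty}^{\infty} c_{k-l}\bar c_{k-m}\Big[\xi_l\bar U_m + U_l\bar\xi_m + U_l\bar U_m - E\big[U_l\bar U_m\big]\Big].
\end{aligned} \tag{39}$$

However,

$$\lim_{n\to\infty}\frac{1}{n^{1/p}}\sum_{k=1}^{n}\sum_{l,m=-\infty}^{\infty} c_{k-l}\bar c_{k-m}\Big(\xi_l\bar U_m + U_l\bar\xi_m + U_l\bar U_m - E\big[U_l\bar U_m\big]\Big) = 0 \quad\text{a.s.} \tag{40}$$

by Theorem 8. Moreover, $\xi_1 + U_1$, $\bar\xi_1 + \bar U_1$ have the same moment and tail probability bounds as $\xi_1$, $\bar\xi_1$. Hence, without loss of generality, we can assume $\xi_1$, $\bar\xi_1$ are continuous random variables, which will be important for the truncation to follow in Step 4.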
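The smoothing step rests on an elementary product identity. As a worked check in the scalar notation $\xi_l, \bar\xi_m, U_l, \bar U_m$ (with $\bar U_m = U_m$ and the $U$'s independent of the innovations, as stated above):

```latex
\begin{align*}
\xi_l\bar\xi_m
  &= (\xi_l + U_l)(\bar\xi_m + \bar U_m)
     - \xi_l\bar U_m - U_l\bar\xi_m - U_l\bar U_m . \\
% Means of the correction terms (independence and zero means):
E[\xi_l\bar U_m] &= E[\xi_l]\,E[\bar U_m] = 0, \qquad
E[U_l\bar\xi_m] = E[U_l]\,E[\bar\xi_m] = 0, \\
% A uniform variable on [-1,1] has variance (1-(-1))^2/12 = 1/3:
E[U_l\bar U_m] &= E[U_l U_m] = \tfrac{1}{3}\,1_{\{l=m\}} .
\end{align*}
```

Each correction term involves at least one bounded factor $U$, so it has light tails; this is why the light-tailed result (Theorem 8) can absorb those terms.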

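Step 2: Handle off-diagonal sum as previous proof since unaffected by heavy tails.

Suppose $S_n^{(2)}$, $S_n^{(3)}$ and $S_n^{(4)}$ are defined as in (16-18). Then, we know that

$$\lim_{n\to\infty}\frac{S_n^{(2)}}{n^{1/p}} = \lim_{n\to\infty}\frac{S_n^{(3)}}{n^{1/p}} = \lim_{n\to\infty}\frac{S_n^{(4)}}{n^{1/p}} = 0 \quad\text{a.s.},$$

provided $p < \frac{1}{2-\sigma-\bar\sigma}$ by the proof of Theorem 8.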

Step 3: Reduce (in diagonal sum) to non-negative $\xi_l\bar\xi_l$ with single atom at 0.

Noting

$$\begin{aligned}
\sum_{l=-\infty}^{\infty} c_{k-l}\bar c_{k-l}\big(\xi_l\bar\xi_l - E[\xi_l\bar\xi_l]\big)
&= \sum_{l=-\infty}^{\infty} c_{k-l}\bar c_{k-l}\big((\xi_l\bar\xi_l)^{+} - E[(\xi_l\bar\xi_l)^{+}]\big) \\
&\quad - \sum_{l=-\infty}^{\infty} c_{k-l}\bar c_{k-l}\big((\xi_l\bar\xi_l)^{-} - E[(\xi_l\bar\xi_l)^{-}]\big),
\end{aligned}$$

we only have to consider the case where $\xi_l\bar\xi_l \ge 0$ for the remainder of the proof. Moreover, insomuch as the proof of the general case only differs cosmetically from the notationally-simpler case where $\bar\xi_l = \xi_l$, $E[\xi_1^2] = 1$ and $c_l = \bar c_l = |l|^{-\sigma}$ for $l \ne 0$, we only provide the proof of the latter, for which the long-range dependence constraint becomes $p < \frac{1}{2-2\sigma}$. We will however indicate the most significant changes that would be needed for the general case.

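Step 4: Divide diagonal terms into zero-mean truncated (i.e. bounded) and remainder pieces.

Let $\kappa > 0$. Fix $u_r^{+} = n_r^{\frac{\kappa}{2-\alpha}}$ to find

$$2\int_0^{u_r^{+}} P(\xi_1^2 > s)\, s\, ds \ll 2\int_0^{u_r^{+}} s\, s^{-\alpha}\, ds \ll n_r^{\kappa} \quad \forall\, r = 1, 2, \ldots \tag{42}$$

Now, by defining

$$\begin{aligned}
\zeta_l &= \zeta_l^{r} = (\xi_l^2 \wedge u_r^{+}) - \vartheta_r, \quad\text{where } \vartheta_r = \int_0^{u_r^{+}} P(\xi_1^2 > s)\, ds \le 1, \\
\bar\zeta_l &= \bar\zeta_l^{r} = \xi_l^2 - 1 - \zeta_l,
\end{aligned} \tag{43}$$

we find that

$$E[\zeta_l] = \int_0^{u_r^{+}} P(\xi_1^2 > t)\, dt - \int_0^{u_r^{+}} P(\xi_1^2 > t)\, dt = 0, \tag{44}$$

so both $\zeta_l$ and $\bar\zeta_l$ are zero mean, and by (42)

$$\begin{aligned}
E\big[|\zeta_1^{r}|^2\big]
&= E\left[\left|\xi_1^2 \wedge u_r^{+} - \int_0^{u_r^{+}} P(\xi_1^2 > t)\, dt\right|^{2}\right] \\
&= 2\int_0^{u_r^{+}} P(\xi_1^2 > s)\, s\, ds - \left(\int_0^{u_r^{+}} P(\xi_1^2 > t)\, dt\right)^{2} \ll n_r^{\kappa} \quad \forall\, r = 1, 2, \ldots
\end{aligned} \tag{45}$$

(In the general case, we note that $\xi_1\bar\xi_1$ is non-negative and of continuous distribution on $(0, \infty)$ so $E[\xi_1\bar\xi_1 \wedge u_r^{+}] = \int_0^{u_r^{+}} P(\xi_1\bar\xi_1 > s)\, ds$ as required. We also have $\bar\zeta_l = \xi_l\bar\xi_l - E[\xi_1\bar\xi_1] - \zeta_l$.)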
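The second-moment formula used in (42) and (45) is the layer-cake identity for a truncated non-negative variable. A minimal worked version follows, where $X \ge 0$ stands in for $\xi_1^2$, $u > 0$ for the truncation level, and $C$ is a hypothetical constant with $P(X > s) \le C s^{-\alpha}$ for all $s > 0$ and $1 < \alpha < 2$:

```latex
\begin{align*}
(X\wedge u)^2 &= \int_0^{X\wedge u} 2s\,ds = \int_0^{u} 2s\,1_{\{X>s\}}\,ds, \\
% Take expectations and exchange E with the integral (Tonelli):
E\big[(X\wedge u)^2\big] &= 2\int_0^{u} s\,P(X>s)\,ds
  \;\le\; 2C\int_0^{u} s^{1-\alpha}\,ds
  \;=\; \frac{2C}{2-\alpha}\,u^{2-\alpha}, \\
% Same argument with first powers gives the mean of the truncated variable:
E[X\wedge u] &= \int_0^{u} P(X>t)\,dt .
\end{align*}
```

In particular, a truncation level of the form $u = n_r^{\kappa/(2-\alpha)}$ (an assumed reading of the choice of $u_r^{+}$) turns the bound $u^{2-\alpha}$ into $n_r^{\kappa}$.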

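Step 5: Moment Bound for truncated using the proof of Theorem 8.

Noting $\{\zeta_l\}$ are i.i.d. with $E[\zeta_1] = 0$ and $E[\zeta_1^2] < \infty$ and defining

$$S_n^{(1)} = \sum_{k=1}^{n}\sum_{l=-\infty}^{\infty} c_{k-l}^2\, \zeta_l, \tag{46}$$

one finds from (23) in the proof of Theorem 8 that

$$E\left[\max_{n_r \le n < n_{r+1}} \left(S_n^{(1)}\right)^2\right] \ll r^2\, E|\zeta_1|^2\, n_r. \tag{47}$$

Hence, it follows by (45) that

$$E\left[\max_{n_r \le n < n_{r+1}} \left(S_n^{(1)}\right)^2\right] \ll r^2\, n_r^{1+\kappa}. \tag{48}$$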


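Step 6: Moment Bound for remainder using Doob's inequality.

Turning to the $\bar\zeta_l$ and using the formula

$$E[g(X)] = \int_0^{\infty} g'(t)\, P(X > t)\, dt - \int_{-\infty}^{0} g'(t)\, P(X < t)\, dt, \tag{49}$$

one has by our tail probability bounds that the non-negative part of $\bar\zeta_1$ satisfies

$$\begin{aligned}
E\big|\bar\zeta_1^{+}\big|^{\tau}
&= \tau\int_0^{\infty} s^{\tau-1}\, P(\xi_1^2 > u_r^{+} + s + 1 - \vartheta_r)\, ds \\
&\le \tau\int_0^{\infty} s^{\tau-1}\, P(\xi_1^2 > u_r^{+} + s)\, ds \quad\text{since } \vartheta_r \le 1 \\
&\ll \int_{u_r^{+}}^{\infty} (s - u_r^{+})^{\tau-1}\, s^{-\alpha}\, ds \\
&\le \int_{u_r^{+}}^{2u_r^{+}} (s - u_r^{+})^{\tau-1}\, ds\, (u_r^{+})^{-\alpha} + \int_{2u_r^{+}}^{\infty} (s - u_r^{+})^{\tau-\alpha-1}\, ds \\
&\ll (u_r^{+})^{\tau-\alpha}
\end{aligned} \tag{50}$$

for $1 < \tau < \alpha$. Therefore, it follows by Jensen's inequality and Doob's $L^p$ inequality that

$$\begin{aligned}
E^{\frac{1}{\tau}}\left[\sup_{n_r \le n < n_{r+1}} \left|\sum_{k=1}^{n}\sum_{l=-\infty}^{\infty} c_{k-l}^2\, \bar\zeta_l\right|^{\tau}\right]
&\le \sum_{l=-\infty}^{\infty} E^{\frac{1}{\tau}}\left[\sup_{n_r \le n < n_{r+1}} \left|\sum_{k=1}^{n} c_{k-l}^2\, \bar\zeta_l\right|^{\tau}\right] \\
&\ll \sum_{l=-\infty}^{\infty} \sup_{n_r \le n < n_{r+1}} \sum_{k=1}^{n} c_{k-l}^2\, \|\bar\zeta_1\|_{\tau} \\
&\ll \sum_{l=-\infty}^{\infty} \sum_{k=1}^{n_{r+1}-1} c_{k-l}^2\, \|\bar\zeta_1\|_{\tau} \\
&\ll n_r\, \|\bar\zeta_1\|_{\tau},
\end{aligned} \tag{51}$$

so by (50,51)

$$E\left[\sup_{n_r \le n < n_{r+1}} \left|\sum_{k=1}^{n}\sum_{l=-\infty}^{\infty} c_{k-l}^2\, \bar\zeta_l\right|^{\tau}\right] \ll n_r^{\tau}\, (u_r^{+})^{\tau-\alpha}. \tag{52}$$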


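Step 7: Use Truncation and Error Term bounds with Borel-Cantelli for convergence.

Combining (48) and (52), one has that

$$\begin{aligned}
&P\left(\sup_{n_r \le n < n_{r+1}} \left|\sum_{k=1}^{n}\sum_{l=-\infty}^{\infty} c_{k-l}^2\, (\xi_l^2 - 1)\right| > 2\varepsilon\, n_r^{\frac{1}{p}}\right) \\
&\quad\le \frac{1}{\varepsilon^2 n_r^{2/p}}\, E\left[\sup_{n_r \le n < n_{r+1}} \left|\sum_{k=1}^{n}\sum_{l=-\infty}^{\infty} c_{k-l}^2\, \zeta_l\right|^{2}\right] + \frac{1}{\varepsilon^{\tau} n_r^{\tau/p}}\, E\left[\sup_{n_r \le n < n_{r+1}} \left|\sum_{k=1}^{n}\sum_{l=-\infty}^{\infty} c_{k-l}^2\, \bar\zeta_l\right|^{\tau}\right] \\
&\quad\ll r^2\, n_r^{1+\kappa-\frac{2}{p}} + n_r^{\tau-\kappa\frac{\alpha-\tau}{2-\alpha}-\frac{\tau}{p}} \\
&\quad= r^2\, n_r^{\frac{2}{\alpha}-\frac{2}{p}} + n_r^{\tau\left(1+\frac{1}{\alpha}-\frac{1}{p}\right)-1},
\end{aligned} \tag{53}$$

by letting $\kappa = \frac{2-\alpha}{\alpha}$. Hence, if $\tau \in \left(1, \frac{\alpha p}{\alpha p + p - \alpha}\right)$, then

$$\sum_{r=1}^{\infty} P\left(\sup_{n_r \le n < n_{r+1}} \left|\sum_{k=1}^{n}\sum_{l=-\infty}^{\infty} c_{k-l}^2\, (\xi_l^2 - 1)\right| > 2\varepsilon\, n_r^{\frac{1}{p}}\right) < \infty, \tag{54}$$

under our heavy-tail condition $p < \alpha$ and

$$\frac{1}{n^{1/p}}\sum_{k=1}^{n}\sum_{l=-\infty}^{\infty} c_{k-l}^2\, (\xi_l^2 - 1) \to 0 \quad\text{a.s.}, \tag{55}$$

by Borel-Cantelli. The proof is complete. □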
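The role of the heavy-tail condition $p < \alpha$ can be made explicit by tracking the two exponents of $n_r$. The sketch below assumes the truncation level $u_r^{+} = n_r^{\kappa/(2-\alpha)}$ and the choice $\kappa = \frac{2-\alpha}{\alpha}$; both are reconstructions from the bounds (45), (48) and (50) rather than quotations, and should be read as assumptions.

```latex
% Truncated part: exponent 1 + kappa - 2/p.
\begin{align*}
1 + \kappa - \frac{2}{p}
  &= 1 + \frac{2-\alpha}{\alpha} - \frac{2}{p}
   = \frac{2}{\alpha} - \frac{2}{p} < 0
  \iff p < \alpha . \\
% Remainder part: exponent tau - kappa (alpha - tau)/(2 - alpha) - tau/p.
\tau - \frac{\alpha-\tau}{\alpha} - \frac{\tau}{p}
  &= \tau\left(1 + \frac{1}{\alpha} - \frac{1}{p}\right) - 1 < 0
  \iff \tau < \frac{\alpha p}{\alpha p + p - \alpha} . \\
% The admissible interval for tau is non-empty and inside (1, alpha):
\frac{\alpha p}{\alpha p + p - \alpha} > 1 &\iff p < \alpha,
\qquad
\frac{\alpha p}{\alpha p + p - \alpha} < \alpha \iff p > 1 .
\end{align*}
```

With a geometric subsequence $n_r$, both terms are then summable in $r$, and the first Borel-Cantelli lemma applies for every $\varepsilon > 0$.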
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